Abstract

We consider a task graph to be executed on a set of processors. We assume that the mapping 
is given, say by an ordered list of tasks to execute on each processor, and we aim at optimizing 
the energy consumption while enforcing a prescribed bound on the execution time. While it 
is not possible to change the allocation of a task, it is possible to change its speed. Rather 
than using a local approach such as backfilling, we consider the problem as a whole and study 
the impact of several speed variation models on its complexity. For continuous speeds, we 
give a closed-form formula for trees and series-parallel graphs, and we cast the problem into 
a geometric programming problem for general directed acyclic graphs. We show that the 
classical dynamic voltage and frequency scaling (DVFS) model with discrete modes leads to an
NP-complete problem, even if the modes are regularly distributed (an important particular case
in practice, which we analyze as the incremental model). On the contrary, the VDD-hopping 
model leads to a polynomial solution. Finally, we provide an approximation algorithm for the 
incremental model, which we extend for the general DVFS model. 



1 Introduction 



The energy consumption of computational platforms has recently become a critical problem, both 
for economic and environmental reasons [25]. As an example, the Earth Simulator requires about
12 MW (megawatts) of peak power, and PetaFlop systems may require 100 MW of power, nearly
the output of a small power plant (300 MW). At $100 per MWh, peak operation of a PetaFlop
machine may thus cost $10,000 per hour [12]. Current estimates state that cooling costs $1 to $3
per watt of heat dissipated OTI . This is just one of the many economical reasons why energy- 
aware scheduling has proved to be an important issue in the past decade, even without considering 
battery-powered systems such as laptops and embedded systems. As an example, the Green500
list (www.green500.org) provides rankings of the most energy-efficient supercomputers in the
world, therefore raising even more awareness about power consumption.

A two-page extended abstract of this work appeared as a short presentation in SPAA 2011, while the long version
has been accepted for publication in "Concurrency and Computation: Practice and Experience".

To help reduce energy dissipation, processors can run at different speeds. Their power con- 
sumption is the sum of a static part (the cost for a processor to be turned on) and a dynamic part, 
which is a strictly convex function of the processor speed, so that the execution of a given amount 
of work costs more power if a processor runs in a higher mode [fl~5l . More precisely, a processor 
running at speed $s$ dissipates $s^3$ watts [17, 28, 7, 2, 10] per time-unit, hence consumes $s^3 \times d$ joules
when operated during d units of time. Faster speeds allow for a faster execution, but they also lead 
to a much higher (supra-linear) power consumption. 

Energy-aware scheduling aims at minimizing the energy consumed during the execution of the 
target application. Obviously, it makes sense only if it is coupled with some performance bound
to achieve; otherwise, the optimal solution is always to run each processor at the slowest possible
speed.

In this paper, we investigate energy-aware scheduling strategies for executing a task graph on 
a set of processors. The main originality is that we assume that the mapping of the task graph 
is given, say by an ordered list of tasks to execute on each processor. There are many situations 
in which this problem is important, such as optimizing for legacy applications, or accounting for 
affinities between tasks and resources, or even when tasks are pre-allocated [29], for example for
security reasons. In such situations, assume that a list-schedule has been computed for the task 
graph, and that its execution time should not exceed a deadline D. We do not have the freedom to 
change the assignment of a given task, but we can change its speed to reduce energy consumption, 
provided that the deadline D is not exceeded after the speed change. Rather than using a local 
approach such as backfilling [32, 27], which only reclaims gaps in the schedule, we consider the
problem as a whole, and we assess the impact of several speed variation models on its complexity. 
More precisely, we investigate the following models: 

Continuous model. Processors can have arbitrary speeds, and can vary them continuously: this
model is unrealistic (any possible value of the speed, say Ve , cannot be obtained) but it is 
theoretically appealing 0. A maximum speed, s max , cannot be exceeded. 

Discrete model. Processors have a discrete number of predefined speeds (or frequencies), which
correspond to different voltages that the processor can be subjected to [26]. Switching frequencies
is not allowed during the execution of a given task, but two different tasks scheduled
on the same processor can be executed at different frequencies.

Vdd-Hopping model. This model is similar to the Discrete one, except that switching modes 
during the execution of a given task is allowed: any rational speed can be simulated, by 
simply switching, at the appropriate time during the execution of a task, between two consecutive
modes [24].

Incremental model. In this variant of the Discrete model, we introduce a value $\delta$ that corresponds
to the minimum permissible speed increment, induced by the minimum voltage increment
that can be achieved when controlling the processor. This new model aims at
capturing a realistic version of the Discrete model, where the different modes are spread 
regularly instead of arbitrarily chosen. 

Our main contributions are the following. For the Continuous model, we give a closed-form
formula for trees and series-parallel graphs, and we cast the problem into a geometric program- 
ming problem [0 for general DAGs. For the Vdd-Hopping model, we show that the optimal 
solution for general DAGs can be computed in polynomial time, using a (rational) linear program. 
Finally, for the Discrete and Incremental models, we show that the problem is NP-complete. 
Furthermore, we provide approximation algorithms which rely on the polynomial algorithm for the 
Vdd-Hopping model, and we compare their solution with the optimal Continuous solution. 

The paper is organized as follows. We start with a survey of related literature in Section 2.
We then provide the formal description of the framework and of the energy models in Section 3,
together with a simple example to illustrate the different models. The next two sections constitute
the heart of the paper: in Section 4, we provide analytical formulas for continuous speeds, and
the formulation into the convex optimization problem. In Section 5, we assess the complexity of
the problem with all the discrete models: Discrete, Vdd-Hopping and Incremental, and we
discuss approximation algorithms. Finally, we conclude in Section 6.

2 Related work 

Reducing the energy consumption of computational platforms is an important research topic, and 
many techniques at the process, circuit design, and micro-architectural levels have been pro- 
posed [|23ll2Tl[T4| . The dynamic voltage and frequency scaling (DVFS) technique has been exten- 
sively studied, since it may lead to efficient energy/performance trade-offs |[T^[T2l l3ll9l l20ll34ll32l . 
Current microprocessors (for instance, from AMD [1] and Intel [16]) allow the speed to be set dynamically.
Indeed, by lowering supply voltage, hence processor clock frequency, it is possible to
achieve important reductions in power consumption, without necessarily increasing the execution 
time. We first discuss different optimization problems that arise in this context. Then we review 
energy models. 

2.1 DVFS and optimization problems 

When dealing with energy consumption, the most usual optimization function consists in mini- 
mizing the energy consumption, while ensuring a deadline on the execution time (i.e., a real-time 
constraint), as discussed in the following papers. 

In [26], Okuma et al. demonstrate that voltage scaling is far more effective than the shutdown
approach, which simply stops the power supply when the system is inactive. Their target processor 
employs just a few discretely variable voltages. De Langen and Juurlink [22] discuss leakage-
aware scheduling heuristics which investigate both DVS and processor shutdown, since static 
power consumption due to leakage current is expected to increase significantly. Chen et al. (HI 
consider parallel sparse applications, and they show that when scheduling applications modeled 
by a directed acyclic graph with a well-identified critical path, it is possible to lower the voltage 
during non-critical execution of tasks, with no impact on the execution time. Similarly, Wang et
al. [32] study the slack time of non-critical jobs: they extend their execution time and thus reduce
the energy consumption without increasing the total execution time. Kim et al. [20] provide
power-aware scheduling algorithms for bag-of-tasks applications with deadline constraints, based 
on dynamic voltage scaling. Their goal is to minimize power consumption as well as to meet the
deadlines specified by application users. 

For real-time embedded systems, slack reclamation techniques are used. Lee and Sakurai [23]
show how to exploit slack time arising from workload variation, thanks to a software feedback 
control of supply voltage. Prathipati [27] discusses techniques to take advantage of run-time vari- 
ations in the execution time of tasks; it determines the minimum voltage under which each task 
can be executed, while guaranteeing the deadlines of each task. Then, experiments are conducted 
on the Intel StrongArm SA-1100 processor, which has eleven different frequencies, and the Intel 
PXA250 XScale embedded processor with four frequencies. In [33], the goal of Xu et al. is to
schedule a set of independent tasks, given a worst case execution cycle (WCEC) for each task, and 
a global deadline, while accounting for time and energy penalties when the processor frequency is 
changing. The frequency of the processor can be lowered when some slack is obtained dynamically,
typically when a task runs faster than its WCEC. Yang and Lin [34] discuss algorithms with
preemption, using DVS techniques; substantial energy can be saved using these algorithms, which
succeed in reclaiming the static and dynamic slack time, with little overhead.

Since an increasing number of systems are powered by batteries, maximizing battery life also 
is an important optimization problem. Battery-efficient systems can be obtained with similar tech- 
niques of dynamic voltage and frequency scaling, as described by Lahiri et al. in lETI . Another 
optimization criterion is the energy-delay product, since it accounts for a trade-off between perfor- 
mance and energy consumption, as for instance discussed by Gonzalez and Horowitz in [13]. We 
do not discuss these latter optimization problems further, since our goal is to minimize the energy
consumption, with a fixed deadline. 

In this paper, the application is a task graph (directed acyclic graph), and we assume that the 
mapping, i.e., an ordered list of tasks to execute on each processor, is given. Hence, our problem is 
closely related to slack reclamation techniques, but instead of focusing on non-critical tasks as for
instance in [32], we consider the problem as a whole. Our contribution is to perform an exhaustive
complexity study for different energy models. In the next paragraph, we discuss related work on 
each energy model. 

2.2 Energy models 

Several energy models are considered in the literature, and they can all be categorized in one 
of the four models investigated in this paper, i.e., Continuous, Discrete, Vdd-Hopping or 
Incremental. 

The Continuous model is used mainly for theoretical studies. For instance, Yao et al. [35],
followed by Bansal et al. [3], aim at scheduling a collection of tasks (with release time, deadline
and amount of work), and the solution is the time at which each task is scheduled, but also, the 
speed at which the task is executed. In these papers, the speed can take any value, hence following 
the Continuous model. 

We believe that the most widely used model is the Discrete one. Indeed, processors have
currently only a few discrete number of possible frequencies 03 [16l |26[ |27]|. Therefore, most 
of the papers discussed above follow this model. Some studies exploit the continuous model to 
determine the smallest frequency required to run a task, and then choose the closest upper discrete 
value, as for instance in [27] and [36].

Recently, a new local dynamic voltage scaling architecture has been developed, based on the
Vdd-Hopping model ll24l HO. It was shown in [23J that significant power can be saved by 
using two distinct voltages, and architectures using this principle have been developed (see for 
instance [19]). Compared to traditional power converters, a new design with no need for large
passives or costly technological options has been validated in a STMicroelectronics CMOS 65nm
low-power technology [24].

To the best of our knowledge, this paper introduces the Incremental model for the first 
time. The main rationale is that future technologies may well have an increased number of pos- 
sible frequencies, and these will follow a regular pattern. For instance, note that the SA-1100 
processor, considered in [27], has eleven frequencies which are equidistant, i.e., they follow the 
Incremental model. Lee and Sakurai [23] exploit discrete levels of clock frequency as $f$, $f/2$,
$f/3$, where $f$ is the master (i.e., the highest) system clock frequency. This model is closer to the
Discrete model, although it exhibits a regular pattern similarly to the Incremental model. 

Our work is the first attempt to compare these different models: on the one hand, we assess 
the impact of the model on the problem complexity (polynomial vs NP-hard), and on the other 
hand, we provide approximation algorithms building upon these results. The closest work to ours 
is the paper by Zhang et al. [36], in which the authors also consider the mapping of directed acyclic
graphs, and compare the Discrete and the Continuous models. We go beyond their work in 
this paper, with an exhaustive complexity study, closed-form formulas for the continuous model, 
and the comparison with the Vdd-Hopping and Incremental models.

3 Framework 

First we detail the optimization problem in Section 3.1. Then we describe the four energy models
in Section 3.2. Finally, we illustrate the models and motivate the problem with an example in
Section 3.3.

3.1 Optimization problem 

Consider an application task graph $\mathcal{G} = (V, \mathcal{E})$, with $n = |V|$ tasks denoted as $V = \{T_1, T_2, \ldots, T_n\}$,
and where the set $\mathcal{E}$ denotes the precedence edges between tasks. Task $T_i$ has a cost $w_i$ for
$1 \le i \le n$. We assume that the tasks in $\mathcal{G}$ have been allocated onto a parallel platform made
up of identical processors. We define the execution graph generated by this allocation as the graph
$G = (V, E)$, with the following augmented set of edges:

• $\mathcal{E} \subseteq E$: if an edge exists in the precedence graph, it also exists in the execution graph;

• if $T_1$ and $T_2$ are executed successively, in this order, on the same processor, then $(T_1, T_2) \in E$.

The goal is to minimize the energy consumed during the execution while enforcing a deadline
$D$ on the execution time. We formalize the optimization problem in the simpler case where
each task is executed at constant speed. This strategy is optimal for the Continuous model (by
a convexity argument) and for the Discrete and Incremental models (by definition). For the
Vdd-Hopping model, we reformulate the problem in Section 5.1. Let $d_i$ be the duration of the
execution of task $T_i$, $t_i$ its completion time, and $s_i$ the speed at which it is executed. We obtain the
following formulation of the MinEnergy($G$, $D$) problem, given an execution graph $G = (V, E)$
and a deadline $D$; the $s_i$ values are variables, whose values are constrained by the energy model
(see Section 3.2).



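$$
\begin{array}{lll}
\text{Minimize} & \sum_{i=1}^{n} s_i^3 \times d_i & \\
\text{subject to} & \text{(i)} \;\; w_i = s_i \times d_i & \text{for each task } T_i \in V \\
 & \text{(ii)} \;\; t_i + d_j \le t_j & \text{for each edge } (T_i, T_j) \in E \\
 & \text{(iii)} \;\; t_i \le D & \text{for each task } T_i \in V
\end{array}
\qquad (1)
$$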



Constraint (i) states that the whole task can be executed in time $d_i$ using speed $s_i$. Constraint (ii)
accounts for all dependencies, and constraint (iii) ensures that the execution time does not exceed
the deadline $D$. The energy consumed throughout the execution is the objective function. It is the
sum, for each task, of the energy consumed by this task, as we detail in the next section. Note that
$d_i = w_i/s_i$, and therefore the objective function can also be expressed as $\sum_{i=1}^{n} s_i^2 \times w_i$.
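
As an illustration of the formulation, the following minimal Python sketch evaluates one candidate solution with a constant speed per task: it derives the durations from constraint (i), propagates completion times along the edges of the execution graph (constraint (ii)), and checks the deadline (constraint (iii)). The function name `evaluate`, the 0-based task indices and the edge-list encoding are assumptions made for this sketch, not notation from the paper.

```python
from collections import defaultdict, deque


def evaluate(weights, edges, speeds, deadline):
    """Return (makespan, energy, feasible) for one constant speed per task.

    weights[i] is w_i, speeds[i] is s_i, and edges lists pairs (i, j) of the
    execution graph, meaning that task i must finish before task j starts.
    """
    n = len(weights)
    durations = [weights[i] / speeds[i] for i in range(n)]  # constraint (i)

    successors = defaultdict(list)
    indegree = [0] * n
    for i, j in edges:
        successors[i].append(j)
        indegree[j] += 1

    # Earliest completion times, in topological order (constraint (ii)).
    completion = list(durations)
    ready = deque(i for i in range(n) if indegree[i] == 0)
    while ready:
        i = ready.popleft()
        for j in successors[i]:
            completion[j] = max(completion[j], completion[i] + durations[j])
            indegree[j] -= 1
            if indegree[j] == 0:
                ready.append(j)

    makespan = max(completion)
    energy = sum(speeds[i] ** 3 * durations[i] for i in range(n))
    return makespan, energy, makespan <= deadline  # constraint (iii)
```

For a hypothetical two-task chain with costs 2 and 4 run at speeds 2 and 4, `evaluate([2, 4], [(0, 1)], [2, 4], 2)` reports a makespan of 2 and an energy of 72, which matches $\sum_i s_i^2 \times w_i = 4 \times 2 + 16 \times 4$.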

3.2 Energy models 

In all models, when a processor operates at speed $s$ during $d$ time-units, the corresponding consumed
energy is $s^3 \times d$, which is the dynamic part of the energy consumption, following the
classical models of the literature [17, 28, 7, 10]. Note that we do not take static energy into
account, because all processors are up and alive during the whole execution. We now detail the
possible speed values in each energy model, which should be added as a constraint in Equation (1).

• In the Continuous model, processors can have arbitrary speeds, from 0 to a maximum
value $s_{\max}$, and a processor can change its speed at any time during execution.

• In the Discrete model, processors have a set of possible speed values, or modes, denoted
as $s_1, \ldots, s_m$. There is no assumption on the range and distribution of these modes. The speed
of a processor cannot change during the computation of a task, but it can change from task
to task.

• In the Vdd-Hopping model, a processor can run at different speeds $s_1, \ldots, s_m$, as in the
previous model, but it can also change its speed during a computation. The energy consumed
during the execution of one task is the sum, on each time interval with constant speed $s$, of
the energy consumed during this interval at speed $s$.

• In the Incremental model, we introduce a value $\delta$ that corresponds to the minimum permissible
speed (i.e., voltage) increment. That means that possible speed values are obtained
as $s = s_{\min} + i \times \delta$, where $i$ is an integer such that $0 \le i \le \frac{s_{\max} - s_{\min}}{\delta}$. Admissible speeds
lie in the interval $[s_{\min}, s_{\max}]$. This new model aims at capturing a realistic version of the
Discrete model, where the different modes are spread regularly between $s_1 = s_{\min}$ and
$s_m = s_{\max}$, instead of being arbitrarily chosen. It is intended as the modern counterpart of a
potentiometer knob!
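
The speed sets and the per-task energy of these models can be sketched in a few lines of Python. The helper names (`incremental_modes`, `energy`, `hopping_energy`) are hypothetical and chosen only for this illustration; exact integer or `Fraction` inputs are assumed so that the floor division does not suffer from rounding.

```python
def incremental_modes(s_min, s_max, delta):
    """List the speeds s_min + i * delta that do not exceed s_max."""
    count = int((s_max - s_min) // delta)  # largest admissible index i
    return [s_min + i * delta for i in range(count + 1)]


def energy(speed, duration):
    """Dynamic energy of running at a constant speed for a given duration."""
    return speed ** 3 * duration


def hopping_energy(intervals):
    """Energy of one task under Vdd-Hopping: sum over (speed, duration) pairs."""
    return sum(energy(s, d) for s, d in intervals)
```

With $s_{\min} = 2$, $s_{\max} = 6$ and $\delta = 2$, `incremental_modes(2, 6, 2)` yields the three modes 2, 4 and 6.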

3.3 Example 

Consider an application with four tasks of costs $w_1 = 3$, $w_2 = 2$, $w_3 = 1$ and $w_4 = 2$, and
one precedence constraint $T_1 \to T_3$. We assume that $T_1$ and $T_2$ are allocated, in this order, onto
processor $P_1$, while $T_3$ and $T_4$ are allocated, in this order, on processor $P_2$. The resulting execution
graph $G$ is given in Figure 1, with two precedence constraints added to the initial task graph. The
deadline on the execution time is $D = 1.5$.

Figure 1: Execution graph for the example.

We set the maximum speed to $s_{\max} = 6$ for the Continuous model. For the Discrete and
Vdd-Hopping models, we use the set of speeds $s_1^{(d)} = 2$, $s_2^{(d)} = 5$ and $s_3^{(d)} = 6$. Finally, for
the Incremental model, we set $\delta = 2$, $s_{\min} = 2$ and $s_{\max} = 6$, so that possible speeds are
$s_1^{(i)} = 2$, $s_2^{(i)} = 4$ and $s_3^{(i)} = 6$. We aim at finding the optimal execution speed $s_i$ for each task $T_i$
($1 \le i \le 4$), i.e., the values of $s_i$ which minimize the energy consumption.

With the Continuous model, the optimal speeds are irrational values, and we obtain

$$s_1 = \frac{2}{3}\left(3 + 35^{1/3}\right) \simeq 4.18; \quad s_2 = s_1 \times \frac{2}{35^{1/3}} \simeq 2.56; \quad s_3 = s_4 = s_1 \times \frac{3}{35^{1/3}} \simeq 3.83.$$

Note that all speeds are lower than the maximum $s_{\max}$. These values are obtained thanks to
the formulas derived in Section 4. The energy consumption is then
$E^{(c)}_{opt} = \sum_{i=1}^{4} w_i \times s_i^2 = 3 s_1^2 + 2 s_2^2 + 3 s_3^2 \simeq 109.6$.
The execution time is $\frac{w_1}{s_1} + \max\left(\frac{w_2}{s_2}, \frac{w_3 + w_4}{s_3}\right)$, and with this solution,
it is equal to the deadline $D$ (actually, both processors reach the deadline, otherwise we could slow
down the execution of one task).
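
The Continuous values quoted for this example can be reproduced numerically. The short Python sketch below treats $T_3$ and $T_4$ as a single task of cost $w_3 + w_4$ (they form a chain on the second processor) and then applies the fork formula with $T_1$ as the source; the variable names are chosen for this illustration only.

```python
D = 1.5
w1, w2, w3, w4 = 3, 2, 1, 2

# T3 and T4 form a chain on the second processor, so they share one speed
# and behave like a single task of cost w3 + w4 (linear-chain result).
branches = [w2, w3 + w4]
root = sum(b ** 3 for b in branches) ** (1 / 3)  # 35^(1/3)

s1 = (w1 + root) / D          # speed of T1, the source of the fork
s2 = s1 * branches[0] / root  # speed of T2
s3 = s4 = s1 * branches[1] / root

energy = w1 * s1 ** 2 + w2 * s2 ** 2 + (w3 + w4) * s3 ** 2
time_p1 = w1 / s1 + w2 / s2           # T1 then T2 on the first processor
time_p2 = w1 / s1 + (w3 + w4) / s3    # T1, then T3 and T4 on the second

print(round(s1, 2), round(s2, 2), round(s3, 2), round(energy, 1))
print(time_p1, time_p2)
```

Both `time_p1` and `time_p2` come out equal to the deadline up to floating-point rounding, which is the property stated in the text.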



For the Discrete model, if we execute all tasks at speed $s_2^{(d)} = 5$, we obtain an energy
$E = 8 \times 5^2 = 200$. A better solution is obtained with $s_1 = s_3^{(d)} = 6$, $s_2 = s_3 = s_1^{(d)} = 2$ and
$s_4 = s_2^{(d)} = 5$, which turns out to be optimal: $E^{(d)}_{opt} = 3 \times 36 + (2 + 1) \times 4 + 2 \times 25 = 170$.
Note that $E^{(d)}_{opt} > E^{(c)}_{opt}$, i.e., the optimal energy consumption with the Discrete model is much
higher than the one achieved with the Continuous model. Indeed, in this case, even though the
first processor executes during $3/6 + 2/2 = D$ time units, the second processor remains idle for part
of the time, since $3/6 + 1/2 + 2/5 = 1.4 < D$. The problem turns out to be NP-hard (see Section 5), and the
solution has been found by performing an exhaustive search.

With the Vdd-Hopping model, we set $s_1 = s_2^{(d)} = 5$; for the other tasks, we run part of the
time at speed $s_2^{(d)} = 5$, and part of the time at speed $s_1^{(d)} = 2$ in order to use the idle time and
lower the energy consumption. $T_2$ is executed at speed $s_1^{(d)}$ during time $\frac{5}{6}$ and at speed $s_2^{(d)}$ during
time $\frac{2}{30}$ (i.e., the first processor executes during time $3/5 + 5/6 + 2/30 = 1.5 = D$, and all the
work for $T_2$ is done: $2 \times 5/6 + 5 \times 2/30 = 2 = w_2$). $T_3$ is executed at speed $s_2^{(d)}$ (during time
$1/5$), and finally $T_4$ is executed at speed $s_1^{(d)}$ during time $0.5$ and at speed $s_2^{(d)}$ during time $1/5$ (i.e.,
the second processor executes during time $3/5 + 1/5 + 0.5 + 1/5 = 1.5 = D$, and all the work
for $T_4$ is done: $2 \times 0.5 + 5 \times 1/5 = 2 = w_4$). This set of speeds turns out to be optimal (i.e., it is
the optimal solution of the linear program introduced in Section 5.1), with an energy consumption
$E^{(v)}_{opt} = (3/5 + 2/30 + 1/5 + 1/5) \times 5^3 + (5/6 + 0.5) \times 2^3 = 144$. As expected, $E^{(c)}_{opt} \le E^{(v)}_{opt} \le E^{(d)}_{opt}$,
i.e., the Vdd-Hopping solution stands between the optimal Continuous solution, and the more
constrained Discrete solution.
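
The arithmetic of this Vdd-Hopping schedule can be checked exactly with rational numbers. The sketch below only verifies the schedule described in the text (work done, processor times, energy); it does not solve the linear program, and the dictionary encoding is an assumption made for the illustration.

```python
from fractions import Fraction as F

LOW, HIGH = 2, 5  # the two modes used in this schedule

# For each task: list of (speed, duration) intervals, as described in the text.
schedule = {
    "T1": [(HIGH, F(3, 5))],
    "T2": [(LOW, F(5, 6)), (HIGH, F(2, 30))],
    "T3": [(HIGH, F(1, 5))],
    "T4": [(LOW, F(1, 2)), (HIGH, F(1, 5))],
}
work = {"T1": 3, "T2": 2, "T3": 1, "T4": 2}


def duration(task):
    """Total execution time of a task over all its constant-speed intervals."""
    return sum(d for _, d in schedule[task])


# Each task must perform exactly its amount of work.
assert all(sum(s * d for s, d in schedule[t]) == work[t] for t in work)

time_p1 = duration("T1") + duration("T2")
time_p2 = duration("T1") + duration("T3") + duration("T4")
energy = sum(s ** 3 * d for ivs in schedule.values() for s, d in ivs)

print(time_p1, time_p2, energy)
```

Using `Fraction` avoids rounding noise, so the comparison with the deadline $3/2$ and with the energy value of the text is exact.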

For the Incremental model, the reasoning is similar to the Discrete case, and the optimal
solution is obtained by an exhaustive search: all tasks should be executed at speed $s_2^{(i)} = 4$, with
an energy consumption $E^{(i)}_{opt} = 8 \times 4^2 = 128 > E^{(c)}_{opt}$. It turns out to be better than Discrete and
Vdd-Hopping, since it has different discrete values of energy which are more appropriate for this
example.
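
The exhaustive searches mentioned for the Discrete and Incremental models are small enough to reproduce directly. The following Python sketch enumerates every assignment of one mode per task for the four-task example and keeps the cheapest one that meets the deadline; the function name `best_assignment` is hypothetical, and the two processor times are hard-coded for this particular execution graph.

```python
from fractions import Fraction
from itertools import product

WEIGHTS = (3, 2, 1, 2)      # w1..w4 of the example
DEADLINE = Fraction(3, 2)   # D = 1.5


def best_assignment(modes):
    """Try every mode assignment and keep the cheapest feasible one."""
    best = None
    for speeds in product(modes, repeat=4):
        d = [Fraction(w, s) for w, s in zip(WEIGHTS, speeds)]
        time_p1 = d[0] + d[1]         # T1 then T2
        time_p2 = d[0] + d[2] + d[3]  # T1 -> T3, then T4
        if max(time_p1, time_p2) > DEADLINE:
            continue
        energy = sum(w * s ** 2 for w, s in zip(WEIGHTS, speeds))
        if best is None or energy < best[0]:
            best = (energy, speeds)
    return best


print(best_assignment((2, 5, 6)))  # Discrete modes of the example
print(best_assignment((2, 4, 6)))  # Incremental modes of the example
```

The search space has $m^n$ assignments for $m$ modes and $n$ tasks, so this brute-force approach is only practical for toy instances such as this one.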

4 The Continuous model 

With the Continuous model, processor speeds can take any value between 0 and $s_{\max}$. First we
prove that, with this model, the processors do not change their speed during the execution of a task
(Section 4.1). Then, we derive in Section 4.2 the optimal speed values for special execution graph
structures, expressed as closed form algebraic formulas, and we show that these values may be
irrational (as already illustrated in the example in Section 3.3). Finally, we formulate the problem
for general DAGs as a convex optimization program in Section 4.3.

4.1 Preliminary lemma 

Lemma 1 (constant speed per task). With the Continuous model, each task is executed at constant
speed, i.e., a processor does not change its speed during the execution of a task.

Proof. Suppose that in the optimal solution, there is a task whose speed changes during the execution.
Consider the first time-step at which the change occurs: the computation begins at speed $s$
from time $t$ to time $t'$, and then continues at speed $s'$ until time $t''$. The total energy consumption
for this task in the time interval $[t, t'']$ is $E = (t' - t) \times s^3 + (t'' - t') \times (s')^3$. Moreover, the amount
of work done for this task is $W = (t' - t) \times s + (t'' - t') \times s'$.

If we run the task during the whole interval $[t, t'']$ at constant speed $W/(t'' - t)$, the same amount
of work is done within the same time. However, the energy consumption during this interval of
time is now $E' = (t'' - t) \times (W/(t'' - t))^3$. By convexity of the function $x \mapsto x^3$, we obtain
$E' < E$ since $t < t' < t''$. This contradicts the hypothesis of optimality of the first solution, which
concludes the proof. □
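
The convexity argument of the proof can be illustrated numerically. The two hypothetical helpers below compute the energy $E$ of a two-speed execution and the energy $E'$ of the constant-speed execution that performs the same work over the same interval; they are a sanity check on sample values, not a proof.

```python
def two_phase_energy(t, t1, t2, s, s_prime):
    """Energy when running at speed s on [t, t1] and s_prime on [t1, t2]."""
    return (t1 - t) * s ** 3 + (t2 - t1) * s_prime ** 3


def constant_speed_energy(t, t1, t2, s, s_prime):
    """Energy when the same work is done at one constant speed on [t, t2]."""
    work = (t1 - t) * s + (t2 - t1) * s_prime
    return (t2 - t) * (work / (t2 - t)) ** 3
```

For instance, running at speed 1 on $[0, 1]$ and at speed 3 on $[1, 2]$ costs $1 + 27 = 28$, whereas doing the same 4 units of work at constant speed 2 on $[0, 2]$ costs $2 \times 8 = 16$.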

4.2 Special execution graphs 
4.2.1 Independent tasks 

Consider the problem of minimizing the energy of n independent tasks (i.e., each task is mapped 
onto a distinct processor, and there are no precedence constraints in the execution graph), while 
enforcing a deadline D. 

Proposition 1 (independent tasks). When $G$ is composed of independent tasks $\{T_1, \ldots, T_n\}$, the
optimal solution to MinEnergy($G$, $D$) is obtained when each task $T_i$ ($1 \le i \le n$) is computed at
speed $s_i = \frac{w_i}{D}$. If there is a task $T_i$ such that $s_i > s_{\max}$, then the problem has no solution.

Proof. For task $T_i$, the speed $s_i$ corresponds to the slowest speed at which the processor can execute
the task, so that the deadline is not exceeded. If $s_i > s_{\max}$, the corresponding processor will never
be able to complete its execution before the deadline, therefore there is no solution. To conclude
the proof, we note that any other solution would have higher values of $s_i$ because of the deadline
constraint, and hence a higher energy consumption. Therefore, this solution is optimal. □

4.2.2 Linear chain of tasks



This case corresponds for instance to $n$ independent tasks $\{T_1, \ldots, T_n\}$ executed on a single
processor. The execution graph is then a linear chain (order of execution of the tasks), with
$T_i \to T_{i+1}$, for $1 \le i < n$.

Proposition 2 (linear chain). When $G$ is a linear chain of tasks, the optimal solution to
MinEnergy($G$, $D$) is obtained when each task is executed at speed $s = \frac{W}{D}$, with $W = \sum_{i=1}^{n} w_i$.
If $s > s_{\max}$, then there is no solution.

Proof. Suppose that in the optimal solution, tasks $T_i$ and $T_j$ are such that $s_i < s_j$. The total
energy consumption is $E_{opt}$. We define $s$ such that the execution of both tasks running at speed $s$
takes the same amount of time as in the optimal solution, i.e., $(w_i + w_j)/s = w_i/s_i + w_j/s_j$:
$s = \frac{w_i + w_j}{w_i s_j + w_j s_i} \times s_i s_j$. Note that $s_i < s < s_j$ (it is the barycenter of two points with positive mass).

We consider a solution such that the speed of task $T_k$, for $1 \le k \le n$, with $k \neq i$ and $k \neq j$,
is the same as in the optimal solution, and the speed of tasks $T_i$ and $T_j$ is $s$. By definition of $s$,
the execution time has not been modified. The energy consumption of this solution is $E$, where
$E_{opt} - E = w_i s_i^2 + w_j s_j^2 - (w_i + w_j) s^2$, i.e., the difference of energy with the optimal solution
is only impacted by tasks $T_i$ and $T_j$, for which the speed has been modified. By convexity of the
function $x \mapsto x^2$, we obtain $E_{opt} > E$, which contradicts its optimality. Therefore, in the optimal
solution, all tasks have the same execution speed. Moreover, the energy consumption is minimized
when the speed is as low as possible, while the deadline is not exceeded. Therefore, the execution
speed of all tasks is $s = W/D$. □

Corollary 1. A linear chain with $n$ tasks is equivalent to a single task of cost $W = \sum_{i=1}^{n} w_i$.

Indeed, in the optimal solution, the $n$ tasks are executed at the same speed, and they can be replaced
by a single task of cost $W$, which is executed at the same speed and consumes the same amount of
energy.
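
The linear-chain result translates into a very short computation. The sketch below, with the hypothetical name `chain_solution`, returns the common speed $W/D$ and the corresponding energy $W \times s^2$, or `None` when the speed would exceed the maximum.

```python
def chain_solution(weights, deadline, s_max):
    """Common speed and energy for a linear chain, or None if infeasible."""
    total = sum(weights)              # W, the cost of the equivalent single task
    speed = total / deadline          # s = W / D
    if speed > s_max:
        return None
    return speed, total * speed ** 2  # energy = sum_i w_i * s^2 = W * s^2
```

For a chain with costs 1 and 2 and deadline 1.5, the common speed is 2 and the energy is 12. Running the first task at speed 4 and the second at speed 1.6 takes the same total time ($0.25 + 1.25$) but costs $16 + 5.12 = 21.12$, in line with the convexity argument of the proof.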



4.2.3 Fork and join graphs 

Let $V = \{T_1, \ldots, T_n\}$. We consider either a fork graph $G = (V \cup \{T_0\}, E)$, with $E = \{(T_0, T_i), T_i \in V\}$,
or a join graph $G = (V \cup \{T_0\}, E)$, with $E = \{(T_i, T_0), T_i \in V\}$. $T_0$ is either the source of
the fork or the sink of the join.

Theorem 1 (fork and join graphs). When $G$ is a fork (resp. join) execution graph with $n + 1$ tasks
$T_0, T_1, \ldots, T_n$, the optimal solution to MinEnergy($G$, $D$) is the following:

• the execution speed of the source (resp. sink) $T_0$ is $s_0 = \frac{\left(\sum_{i=1}^{n} w_i^3\right)^{1/3} + w_0}{D}$;

• for the other tasks $T_i$, $1 \le i \le n$, we have $s_i = s_0 \times \frac{w_i}{\left(\sum_{j=1}^{n} w_j^3\right)^{1/3}}$ if $s_0 \le s_{\max}$.

Otherwise, $T_0$ should be executed at speed $s_0 = s_{\max}$, and the other speeds are $s_i = \frac{w_i}{D'}$, with
$D' = D - \frac{w_0}{s_{\max}}$, if they do not exceed $s_{\max}$ (Proposition 1 for independent tasks). Otherwise there
is no solution.

If no speed exceeds $s_{\max}$, the corresponding energy consumption is

$$\mathrm{minE}(G, D) = \frac{\left(\left(\sum_{i=1}^{n} w_i^3\right)^{1/3} + w_0\right)^3}{D^2}.$$

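Proof. Let $t_0 = \frac{w_0}{s_0}$. Then, the source or the sink requires a time $t_0$ for execution. For $1 \le i \le n$,
task $T_i$ must be executed within a time $D - t_0$ so that the deadline is respected. Given $t_0$, we can
compute the speed for task $T_i$ using Proposition 1 since the tasks are independent:
$s_i = \frac{w_i}{D - t_0} = w_i \cdot \frac{s_0}{s_0 D - w_0}$. The objective is therefore to minimize $\sum_{i=0}^{n} w_i s_i^2$, which is a function of $s_0$:

$$\sum_{i=0}^{n} w_i s_i^2 = w_0 s_0^2 + \sum_{i=1}^{n} w_i^3 \cdot \frac{s_0^2}{(s_0 D - w_0)^2} = s_0^2 \left( w_0 + \frac{\sum_{i=1}^{n} w_i^3}{(s_0 D - w_0)^2} \right) = f(s_0).$$

Let $W_3 = \sum_{i=1}^{n} w_i^3$. In order to find the value of $s_0$ which minimizes this function, we study the
function $f(x)$, for $x > 0$. We have
$f'(x) = 2x \left( w_0 + \frac{W_3}{(xD - w_0)^2} \right) - 2D \cdot x^2 \cdot \frac{W_3}{(xD - w_0)^3}$, and therefore $f'(x) = 0$
for $x = (W_3^{1/3} + w_0)/D$. We conclude that the optimal speed for task $T_0$ is $s_0 = \frac{\left(\sum_{i=1}^{n} w_i^3\right)^{1/3} + w_0}{D}$,
if $s_0 \le s_{\max}$. Otherwise, $T_0$ should be executed at the maximum speed $s_0 = s_{\max}$, since it is the
bottleneck task. In any case, for $1 \le i \le n$, the optimal speed for task $T_i$ is $s_i = w_i \frac{s_0}{s_0 D - w_0}$.

Finally, we compute the exact expression of $\mathrm{minE}(G, D) = f(s_0)$, when $s_0 \le s_{\max}$:

$$f(s_0) = s_0^2 \left( w_0 + \frac{W_3}{(s_0 D - w_0)^2} \right) = \left( \frac{W_3^{1/3} + w_0}{D} \right)^2 \left( \frac{W_3}{W_3^{2/3}} + w_0 \right) = \frac{\left( W_3^{1/3} + w_0 \right)^3}{D^2},$$

which concludes the proof. □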

Corollary 2 (equivalent tasks for speed). Consider a fork or join graph with tasks $T_i$, $0 \le i \le n$,
and a deadline $D$, and assume that the speeds in the optimal solution to MinEnergy($G$, $D$) do
not exceed $s_{\max}$. Then, these speeds are the same as in the optimal solution for $n + 1$ independent
tasks $T'_0, T'_1, \ldots, T'_n$, where $w'_0 = \left(\sum_{i=1}^{n} w_i^3\right)^{1/3} + w_0$, and, for $1 \le i \le n$,
$w'_i = w'_0 \cdot \frac{w_i}{\left(\sum_{j=1}^{n} w_j^3\right)^{1/3}}$.

Corollary 3 (equivalent task for energy). Consider a fork or join graph $G$ and a deadline $D$, and
assume that the speeds in the optimal solution to MinEnergy($G$, $D$) do not exceed $s_{\max}$. We
say that the graph $G$ is equivalent to the graph $G^{(eq)}$, consisting of a single task of weight
$w_0^{(eq)} = \left(\sum_{i=1}^{n} w_i^3\right)^{1/3} + w_0$, because the minimum energy consumption of both graphs are identical:
$\mathrm{minE}(G, D) = \mathrm{minE}(G^{(eq)}, D)$.
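
The closed-form solution for fork and join graphs, including the case where the source (or sink) is capped at $s_{\max}$, can be sketched as follows. The function name `fork_solution` and its return convention are assumptions made for this illustration, and at least one task besides $T_0$ is assumed.

```python
def fork_solution(w0, weights, deadline, s_max):
    """Speeds (s0, [s1..sn]) and energy for a fork or join graph, or None."""
    root = sum(w ** 3 for w in weights) ** (1 / 3)
    s0 = (root + w0) / deadline
    if s0 <= s_max:
        speeds = [s0 * w / root for w in weights]
    else:
        # The source (or sink) is capped at s_max; the other tasks are then
        # independent tasks with the remaining deadline D' = D - w0 / s_max.
        s0 = s_max
        remaining = deadline - w0 / s_max
        if remaining <= 0:
            return None
        speeds = [w / remaining for w in weights]
        if max(speeds) > s_max:
            return None
    energy = w0 * s0 ** 2 + sum(w * s ** 2 for w, s in zip(weights, speeds))
    return s0, speeds, energy
```

When no speed is capped, the returned energy agrees with the closed form $\left(\left(\sum_i w_i^3\right)^{1/3} + w_0\right)^3 / D^2$, and every task $T_i$ finishes exactly at the deadline, since $w_0/s_0 + w_i/s_i = D$.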



4.2.4 Trees 

We extend the results on a fork graph for a tree $G = (V, E)$ with $|V| = n + 1$ tasks. Let $T_0$ be the
root of the tree; it has $k$ children tasks, which are each themselves the root of a tree. A tree can
therefore be seen as a fork graph, where the tasks of the fork are trees.

The previous results for fork graphs naturally lead to an algorithm that peels off branches of the
tree, starting with the leaves, and replaces each fork subgraph in the tree, composed of a root $T_0$
and $k$ children, by one task (as in Corollary 3) which becomes the unique child of $T_0$'s parent in the
tree. We say that this task is equivalent to the fork graph, since the optimal energy consumption
will be the same. The computation of the equivalent cost of this task is done thanks to a call
to the eq procedure, while the tree procedure computes the solution to MinEnergy($G$, $D$) (see
Algorithm 1). Note that the algorithm computes the minimum energy for a tree, but it does not
return the speeds at which each task must be executed. However, the algorithm returns the speed
of the root task, and it is then straightforward to compute the speed of each child of the root
task, and so on.
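
The reduction described above can be sketched recursively. The code below is not the paper's Algorithm 1: it is a minimal illustration that assumes no speed ever reaches $s_{\max}$, and the dictionary encoding of the tree as well as the names `tree_energy` and `tree_speeds` are hypothetical. The example tree encodes the execution graph of Section 3.3.

```python
def eq(tree, node):
    """Equivalent cost of the subtree rooted at `node`.

    `tree` maps each task to a pair (cost, list of children).
    """
    cost, children = tree[node]
    if not children:
        return cost
    return sum(eq(tree, c) ** 3 for c in children) ** (1 / 3) + cost


def tree_energy(tree, root, deadline):
    """Minimum energy, assuming that no speed reaches s_max."""
    return eq(tree, root) ** 3 / deadline ** 2


def tree_speeds(tree, root, deadline, speeds=None):
    """Speed of every task, obtained top-down from the speed of the root."""
    speeds = {} if speeds is None else speeds
    cost, children = tree[root]
    speeds[root] = eq(tree, root) / deadline
    for child in children:
        # Each subtree must finish within the time left after the root.
        tree_speeds(tree, child, deadline - cost / speeds[root], speeds)
    return speeds


EXAMPLE_TREE = {
    "T1": (3, ["T2", "T3"]),
    "T2": (2, []),
    "T3": (1, ["T4"]),
    "T4": (2, []),
}
print(tree_energy(EXAMPLE_TREE, "T1", 1.5))
print(tree_speeds(EXAMPLE_TREE, "T1", 1.5))
```

Because `eq` is recomputed at every level, this sketch performs a number of operations that is at most quadratic in the number of tasks; caching the equivalent costs would avoid the recomputation.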

Theorem 2 (tree graphs). When $G$ is a tree rooted in $T_0$ ($T_0 \in V$, where $V$ is the set of tasks), the
optimal solution to MinEnergy($G$, $D$) can be computed in polynomial time $O(|V|^2)$.

Proof. Let G be a tree graph rooted in $T_0$. The optimal solution to MinEnergy(G, D) is obtained
with a call to tree(G, $T_0$, D), and we prove its optimality recursively on the depth of the tree.
Similarly to the case of fork graphs, we reduce the tree to an equivalent task which, if executed
alone within a deadline D, consumes exactly the same amount of energy. The procedure eq
reduces a tree to its equivalent task (see Algorithm 1).

If the tree has depth 0, then it is a single task, eq(G, $T_0$) returns the equivalent cost $w_0$, and
the optimal execution speed is $\frac{w_0}{D}$ (see Proposition 1). There is a solution if and only if this speed
is not greater than $s_{\max}$, and then the corresponding energy consumption is $\frac{w_0^3}{D^2}$, as returned by the
algorithm.

Assume now that for any tree of depth i < p, eq computes its equivalent cost, and tree returns
its optimal energy consumption. We consider a tree G of depth p rooted in $T_0$: $G = T_0 \cup \{G_i\}$,
where each subgraph $G_i$ is a tree, rooted in $T_i$, of maximum depth p − 1. As in the case of forks,
we know that each subtree $G_i$ has a deadline D − x, where $x = \frac{w_0}{s_0}$ and $s_0$ is the speed at which
task $T_0$ is executed. By induction hypothesis, we suppose that each graph $G_i$ is equivalent to a
single task, $T'_i$, of cost $w'_i$ (as computed by the procedure eq). We can then use the results obtained
on forks to compute $w_0^{(eq)}$ (see proof of Theorem 1):

$$w_0^{(eq)} = \left(\sum_{i} (w'_i)^3\right)^{1/3} + w_0.$$

Finally the tree is equivalent to one task of cost $w_0^{(eq)}$, and if $\frac{w_0^{(eq)}}{D} \le s_{\max}$, the energy consumption
is $\frac{(w_0^{(eq)})^3}{D^2}$, and no speed exceeds $s_{\max}$.

Note that the speed of a task is always greater than the speed of its successors. Therefore,
if $\frac{w_0^{(eq)}}{D} > s_{\max}$, we execute the root of the tree at speed $s_{\max}$ and then process each subtree $G_i$
independently. Of course, there is no solution if $\frac{w_0}{s_{\max}} > D$, and otherwise we perform the recursive
calls to tree to process each subtree independently. Their deadline is then $D - \frac{w_0}{s_{\max}}$.

To study the time complexity of this algorithm, first note that when calling tree(G, $T_0$, D),
there might be at most |V| recursive calls to tree, one at each node of the tree. Without accounting
for the recursive calls, the tree procedure performs one call to the eq procedure, which computes
the cost of the equivalent task. This eq procedure takes time $O(|V|)$, since we have to consider
the |V| tasks, and we add the costs one by one. Therefore, the overall complexity is in $O(|V|^2)$. □
Algorithm 1: Solution to MinEnergy(G, D) for trees.

procedure tree (tree G, root $T_0$, deadline D)
begin
    Let w = eq (tree G, root $T_0$);
    if $\frac{w}{D} \le s_{\max}$ then
        return $\frac{w^3}{D^2}$;
    else
        if $\frac{w_0}{s_{\max}} > D$ then
            return Error: No Solution;
        else
            /* $T_0$ is executed at speed $s_{\max}$ */
            return $w_0 \times s_{\max}^2 + \sum_{G_i \text{ subtree rooted in } T_i \in children(T_0)} tree\left(G_i, T_i, D - \frac{w_0}{s_{\max}}\right)$;
        end
    end
end

procedure eq (tree G, root $T_0$)
begin
    if children($T_0$) = ∅ then
        return $w_0$;
    else
        return $\left(\sum_{G_i \text{ subtree rooted in } T_i \in children(T_0)} (eq(G_i, T_i))^3\right)^{1/3} + w_0$;
    end
end
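Algorithm 1 translates almost line for line into a short recursive program. The sketch below is one possible transcription, assuming a tree is represented as a pair `(weight, list_of_subtrees)`; this representation, the function names `eq` and `tree_energy`, and the explicit guard on a non-positive deadline are choices made for illustration only.

```python
def eq(tree):
    """Equivalent cost of a tree given as (weight, [subtrees])."""
    w0, kids = tree
    if not kids:
        return w0
    return sum(eq(k) ** 3 for k in kids) ** (1.0 / 3.0) + w0


def tree_energy(tree, D, s_max):
    """Minimum energy for a tree under deadline D and speed bound s_max."""
    if D <= 0:
        raise ValueError("no solution")       # guard added for this sketch
    w0, kids = tree
    w = eq(tree)
    if w / D <= s_max:
        return w ** 3 / D ** 2                # the equivalent task fits
    if w0 / s_max > D:
        raise ValueError("no solution")
    # The root runs at s_max; each subtree is then solved independently.
    remaining = D - w0 / s_max
    return w0 * s_max ** 2 + sum(tree_energy(k, remaining, s_max) for k in kids)


fork = (2.0, [(1.0, []), (2.0, []), (3.0, [])])
unbounded = tree_energy(fork, 10.0, float("inf"))
bounded = tree_energy(fork, 10.0, 0.5)        # the root is capped at s_max
```

With $s_{\max} = 0.5$ the root needs 4 time units, the leaves share the remaining 6, and the energy rises slightly compared with the unbounded case, as expected from a more constrained problem.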
Figure 2: Composition of series-parallel graphs (SPGs). (a) Two SPGs before composition. (b) Parallel composition. (c) Series composition.



4.2.5 Series-parallel graphs 

We can further generalize our results to series-parallel graphs (SPGs), which are built from a
sequence of compositions (parallel or series) of smaller-size SPGs. The smallest SPG consists of
two nodes connected by an edge (such a graph is called an elementary SPG). The first node is the
source, while the second one is the sink of the SPG. When composing two SPGs in series, we
merge the sink of the first SPG with the source of the second one. For a parallel composition, the
two sources are merged, as well as the two sinks, as illustrated in Figure 2.

We can extend the results for tree graphs to SPGs, by replacing step by step the SPGs by an
equivalent task (procedure cost in Algorithm 2): we can compute the equivalent cost for a series
or parallel composition.

However, since it is no longer true that the speed of a task is always larger than the speed of its
successor (as was the case in a tree), we have not been able to find a recursive property on the tasks
that should be set to $s_{\max}$ when one of the speeds obtained with the previous method exceeds $s_{\max}$.
The problem of computing a closed form for an SPG with a finite value of $s_{\max}$ remains open. Still,
we have the following result when $s_{\max} = +\infty$:

Theorem 3 (series-parallel graphs). When G is an SPG, it is possible to compute recursively a
closed-form expression of the optimal solution of MinEnergy(G, D), assuming $s_{\max} = +\infty$, in
polynomial time $O(|V|)$, where V is the set of tasks.

Proof. Let G be a series-parallel graph. The optimal solution to MinEnergy(G, D) is obtained
with a call to SPG(G, D), and we prove its optimality recursively. Similarly to trees, the main
idea is to peel the graph off, and to transform it until there remains only a single equivalent task
which, if executed alone within a deadline D, would consume exactly the same amount of energy.
The procedure cost reduces an SPG to its equivalent task (see Algorithm 2).

The proof is done by induction on the number p of compositions required to build the graph G.
If p = 0, G is an elementary SPG consisting of two tasks, the source $T_0$ and the sink $T_1$. It is
therefore a linear chain, and hence equivalent to a single task whose cost is the sum of both
costs, $w_0 + w_1$ (see Corollary 1 for linear chains). The procedure cost therefore returns the correct
equivalent cost, and SPG returns the minimum energy consumption.

Let us assume that the procedures return the correct equivalent cost and minimum energy
consumption for any SPG consisting of i < p compositions. We consider an SPG G with p
compositions. By definition, G is a composition of two smaller-size SPGs, $G_1$ and $G_2$, and both of these
SPGs have strictly fewer than p compositions. We consider $G'_1$ and $G'_2$, which are identical to $G_1$
and $G_2$, except that the costs of their source and sink tasks are set to 0 (these costs are handled
separately), and we can reduce both of these SPGs to an equivalent task, of respective costs $w'_1$
and $w'_2$, by induction hypothesis. There are two cases:

• If G is a series composition, then after the reduction of $G'_1$ and $G'_2$, we have a linear chain in
which we consider the source $T_0$ of $G_1$, the sink $T_1$ of $G_1$ (which is also the source of $G_2$),
and the sink $T_2$ of $G_2$. The equivalent cost is therefore $w_0 + w'_1 + w_1 + w'_2 + w_2$, thanks to
Corollary 1 for linear chains.

• If G is a parallel composition, the resulting graph is a fork-join graph, and we can use
Corollaries 2 and 3 to compute the cost of the equivalent task, accounting for the source $T_0$
and the sink $T_1$: $w_0 + \left((w'_1)^3 + (w'_2)^3\right)^{1/3} + w_1$.

Once the cost of the equivalent task of the SPG has been computed with the call to cost(G),
the optimal energy consumption is $\frac{(cost(G))^3}{D^2}$.

Contrary to the case of tree graphs, we never need to call the SPG procedure again
because there is no constraint on $s_{\max}$, so the time complexity of the algorithm is the complexity
of the cost procedure. There is exactly one call to cost for each composition, and the number
of compositions in the SPG is in $O(|V|)$. All operations in cost can be done in $O(1)$, hence a
complexity in $O(|V|)$. □
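The recursion used in this proof can be sketched in a few lines. The encoding below is an assumption made for illustration: an SPG is a nested tuple `("edge", u, v)`, `("series", g1, g2)` or `("parallel", g1, g2)`, and task weights live in a dictionary keyed by node name. The helper `inner_cost` plays the role of the cost of $G'_i$ (source and sink weights set to 0); it is a refactoring of the cost procedure, not a different method. The `source` and `sink` helpers walk the structure, so this sketch does not achieve the $O(1)$ cost per composition mentioned above.

```python
def source(g):
    """Source node of an SPG in the tuple encoding described above."""
    return g[1] if g[0] == "edge" else source(g[1])


def sink(g):
    """Sink node of an SPG in the tuple encoding described above."""
    return g[2] if g[0] == "edge" else sink(g[2])


def inner_cost(g, w):
    """Equivalent cost of g once its own source and sink weights are set to 0."""
    kind, a, b = g
    if kind == "edge":
        return 0.0
    if kind == "series":
        # The merged middle node (sink of a, source of b) is counted once.
        return inner_cost(a, w) + w[sink(a)] + inner_cost(b, w)
    # Parallel composition: cubic root of the sum of cubes.
    return (inner_cost(a, w) ** 3 + inner_cost(b, w) ** 3) ** (1.0 / 3.0)


def cost(g, w):
    return w[source(g)] + inner_cost(g, w) + w[sink(g)]


def spg_energy(g, w, D):
    """Minimum energy with s_max = +infinity."""
    return cost(g, w) ** 3 / D ** 2


weights = {"A": 1.0, "B": 1.0, "C": 2.0, "Z": 1.0}
branch_b = ("series", ("edge", "A", "B"), ("edge", "B", "Z"))
branch_c = ("series", ("edge", "A", "C"), ("edge", "C", "Z"))
fork_join = ("parallel", branch_b, branch_c)
```

For the fork-join graph above, the two branches contribute $(1^3 + 2^3)^{1/3}$, to which the source and sink weights are added.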



4.3 General DAGs 

For arbitrary execution graphs, we can rewrite the MinEnergy(G, D) problem as follows:

Minimize $\sum_{i=1}^{n} u_i^{-2} \times w_i$
subject to
(i) $t_i + w_j \times u_j \le t_j$ for each edge $(T_i, T_j) \in E$
(ii) $t_i \le D$ for each task $T_i \in V$
(iii) $u_i \ge \frac{1}{s_{\max}}$ for each task $T_i \in V$

Here, $u_i = 1/s_i$ is the inverse of the speed to execute task $T_i$. We now have a convex
optimization problem to solve, with linear constraints in the non-negative variables $u_i$ and $t_i$. In fact,
the objective function is a posynomial, so we have a geometric programming problem (see [6,
Section 4.5]) for which efficient numerical schemes exist. However, as illustrated on simple fork
graphs, the optimal speeds are not expected to be rational numbers but instead arbitrarily complex
expressions (we have the cubic root of the sum of cubes for forks, and nested expressions of this
form for trees). From a computational complexity point of view, we do not know how to encode
such numbers in polynomial size of the input (the rational task weights and the execution
deadline). Still, we can always solve the problem numerically and get fixed-size numbers which are
good approximations of the optimal values.
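To make the program above concrete, the following sketch evaluates a candidate speed assignment rather than solving the optimization problem. It assumes that $t_i$ denotes the earliest completion time of $T_i$ (tasks without predecessors start at time 0), uses the standard-library `graphlib` module (Python 3.9 or later), and introduces the hypothetical function name `evaluate`.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def evaluate(weights, edges, speeds, D, s_max):
    """Check constraints (i)-(iii) for given speeds and return the objective.

    weights, speeds: dicts keyed by task name; edges: list of (pred, succ).
    """
    preds = {t: set() for t in weights}
    for a, b in edges:
        preds[b].add(a)
    finish = {}
    # Visit tasks so that every predecessor is handled before its successors.
    for t in TopologicalSorter(preds).static_order():
        start = max((finish[p] for p in preds[t]), default=0.0)
        finish[t] = start + weights[t] / speeds[t]    # t_i + w_j * u_j
    feasible = (max(finish.values()) <= D + 1e-12
                and all(s <= s_max for s in speeds.values()))
    energy = sum(weights[t] * speeds[t] ** 2 for t in weights)  # sum of w_i / u_i^2
    return feasible, energy, finish


ok, e, finish = evaluate({"A": 1.0, "B": 1.0}, [("A", "B")],
                         {"A": 0.5, "B": 0.5}, D=4.0, s_max=1.0)
```

Such an evaluator is a convenient way to validate the fixed-size numbers returned by a numerical geometric-programming solver.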
Algorithm 2: Solution to MinEnergy(G, D) for series-parallel graphs.

procedure SPG (series-parallel graph G, deadline D)
begin
    return $\frac{(cost(G))^3}{D^2}$;
end

procedure cost (series-parallel graph G)
begin
    Let $T_0$ be the source of G and $T_1$ its sink;
    if G is composed of only two tasks, $T_0$ and $T_1$ then
        return $w_0 + w_1$;
    else
        /* G is a composition of two SPGs $G_1$ and $G_2$. */
        For i = 1, 2, let $G'_i = G_i$ where the cost of source and sink tasks is set to 0;
        $w'_1 = cost(G'_1)$; $w'_2 = cost(G'_2)$;
        if G is a series composition then
            Let $T_0$ be the source of $G_1$, $T_1$ be its sink, and $T_2$ be the sink of $G_2$;
            return $w_0 + w'_1 + w_1 + w'_2 + w_2$;
        else
            /* It is a parallel composition. */
            Let $T_0$ be the source of G, and $T_1$ be its sink;
            return $w_0 + \left((w'_1)^3 + (w'_2)^3\right)^{1/3} + w_1$;
        end
    end
end
In the following, we show that the total power consumption of any optimal schedule is constant
throughout execution. While this important property does not help to design an optimal solution,
it shows that a schedule with large variations in its power consumption is likely to waste a lot of
energy.

We need a few notations before stating the result. Consider a schedule for a graph G = (V, E)
with n tasks. Task $T_i$ is executed at constant speed $s_i$ (see Lemma 1) and during interval $[b_i, c_i]$:
$T_i$ begins its execution at time $b_i$ and completes it at time $c_i$. The total power consumption P(t) of
the schedule at time t is defined as the sum of the power consumed by all tasks executing at time t:

$$P(t) = \sum_{1 \le i \le n,\ t \in [b_i, c_i]} s_i^3.$$

Theorem 4. Consider an instance of Continuous, and an optimal schedule for this instance,
such that no speed is equal to $s_{\max}$. Then the total power consumption of the schedule throughout
execution is constant.

Proof. We prove this theorem by induction on the number of tasks of the graph. First we prove a
preliminary result:

Lemma 2. Consider a graph G = (V, E) with $n \ge 2$ tasks, and any optimal schedule of
deadline D. Let $t_1$ be the earliest completion time of a task in the schedule. Similarly, let $t_2$ be the
latest starting time of a task in the schedule. Then, either G is composed of independent tasks, or
$0 < t_1 \le t_2 < D$.

Proof. Task $T_i$ is executed at speed $s_i$ and during interval $[b_i, c_i]$. We have $t_1 = \min_{1 \le i \le n} c_i$ and
$t_2 = \max_{1 \le i \le n} b_i$. Clearly, $0 \le t_1, t_2 \le D$ by definition of the schedule. Suppose that $t_2 < t_1$. Let
$T_1$ be a task that ends at time $t_1$, and $T_2$ one that starts at time $t_2$. Then:

• $\nexists T \in V$, $(T_1, T) \in E$ (otherwise, T would start after $t_2$); therefore, $t_1 = D$;

• $\nexists T \in V$, $(T, T_2) \in E$ (otherwise, T would finish before $t_1$); therefore, $t_2 = 0$.

This also means that all tasks start at time 0 and end at time D. Therefore, G is only composed of
independent tasks. □

Back to the proof of the theorem, we consider first the case of a graph with only one task. In
an optimal schedule, the task is executed in time D, and at constant speed (Lemma 1), hence with
constant power consumption.

Suppose now that the property is true for all DAGs with at most n − 1 tasks. Let G be a
DAG with n tasks. If G is exactly composed of n independent tasks, then we know that the power
consumption of G is constant (because all task speeds are constant). Otherwise, let $t_1$ be the
earliest completion time, and $t_2$ the latest starting time of a task in the optimal schedule. Thanks
to Lemma 2, we have $0 < t_1 \le t_2 < D$.

Suppose first that $t_1 = t_2 = t_0$. There are three kinds of tasks: those beginning at time 0 and
ending at time $t_0$ (set $S_1$), those beginning at time $t_0$ and ending at time D (set $S_2$), and finally
those beginning at time 0 and ending at time D (set $S_3$). Tasks in $S_3$ execute during the whole
schedule duration, at constant speed, hence their contribution to the total power consumption P(t)
is the same at each time-step t. Therefore, we can suppress them from the schedule without loss
of generality. Next we determine the value of $t_0$. Let $A_1 = \sum_{T_i \in S_1} w_i^3$, and $A_2 = \sum_{T_i \in S_2} w_i^3$. The
energy consumption between 0 and $t_0$ is $\frac{A_1}{t_0^2}$, and between $t_0$ and D, it is $\frac{A_2}{(D - t_0)^2}$. The optimal energy
consumption is obtained with $t_0 = \frac{A_1^{1/3}}{A_1^{1/3} + A_2^{1/3}} D$. Then, the total power consumption of the optimal
schedule is the same in both intervals, hence at each time-step: we derive that $P(t) = \left(\frac{A_1^{1/3} + A_2^{1/3}}{D}\right)^3$,
which is constant.

Suppose now that $t_1 < t_2$. For each task $T_i$, let $w'_i$ be the number of operations executed
before $t_1$, and $w''_i$ the number of operations executed after $t_1$ (with $w'_i + w''_i = w_i$). Let G' be the
DAG G with execution costs $w'_i$, and G'' be the DAG G with execution costs $w''_i$. The tasks with a
cost equal to 0 are removed from the DAGs. Then, both G' and G'' have strictly fewer than n tasks.
We can therefore apply the induction hypothesis. We derive that the power consumption in both
DAGs is constant. Since we did not change the speeds of the tasks, the total power consumption
P(t) in G is the same as in G' if $t < t_1$, hence a constant. Similarly, the total power consumption
P(t) in G is the same as in G'' if $t > t_1$, hence a constant. Considering the same partitioning with
$t_2$ instead of $t_1$, we show that the total power consumption P(t) is a constant before $t_2$, and also
a constant after $t_2$. But $t_1 < t_2$, and the intervals $[0, t_2]$ and $[t_1, D]$ overlap. Altogether, the total
power consumption is the same constant throughout [0, D], which concludes the proof. □
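The case $t_1 = t_2 = t_0$ of this proof lends itself to a quick numerical check. The sketch below assumes two sets of tasks $S_1$ and $S_2$ given only by their weights (tasks of $S_3$ are ignored, as in the proof); the function names `two_phase_power` and `energy_at` are illustrative.

```python
def two_phase_power(S1, S2, D):
    """Optimal switch time t0 and the power drawn before and after it."""
    A1 = sum(w ** 3 for w in S1)
    A2 = sum(w ** 3 for w in S2)
    r1, r2 = A1 ** (1.0 / 3.0), A2 ** (1.0 / 3.0)
    t0 = r1 / (r1 + r2) * D
    p_before = A1 / t0 ** 3           # tasks of S1 run at speed w / t0
    p_after = A2 / (D - t0) ** 3      # tasks of S2 run at speed w / (D - t0)
    return t0, p_before, p_after


def energy_at(t, S1, S2, D):
    """Energy when the switch between S1 and S2 happens at time t."""
    A1 = sum(w ** 3 for w in S1)
    A2 = sum(w ** 3 for w in S2)
    return A1 / t ** 2 + A2 / (D - t) ** 2


S1, S2, D = [1.0, 2.0], [3.0], 10.0
t0, p_before, p_after = two_phase_power(S1, S2, D)
# Moving the switch time away from t0 should only increase the energy.
worse = min(energy_at(t0 - 0.1, S1, S2, D), energy_at(t0 + 0.1, S1, S2, D))
```

Both power values coincide with $\left((A_1^{1/3} + A_2^{1/3})/D\right)^3$, which illustrates why an energy-optimal schedule cannot have a power profile with steps.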




5 Discrete models 



In this section, we present complexity results on the three energy models with a finite number of
possible speeds. The only polynomial instance is for the Vdd-Hopping model, for which we
write a linear program in Section 5.1. Then, we give NP-completeness results in Section 5.2 and
approximation results in Section 5.3 for the Discrete and Incremental models.

5.1 The Vdd-Hopping model 

Theorem 5. With the Vdd-Hopping model, MinEnergy(G, D) can be solved in polynomial
time.

Proof. Let G be the execution graph of an application with n tasks, and D a deadline. Let $s_1, \ldots, s_m$
be the set of possible processor speeds. We use the following rational variables: for $1 \le i \le n$
and $1 \le j \le m$, $b_i$ is the starting time of the execution of task $T_i$, and $\alpha_{(i,j)}$ is the time spent at
speed $s_j$ for executing task $T_i$. There are $n + n \times m = n(m + 1)$ such variables. Note that the total
execution time of task $T_i$ is $\sum_{j=1}^{m} \alpha_{(i,j)}$. The constraints are:

• $\forall 1 \le i \le n$, $b_i \ge 0$: starting times of all tasks are non-negative numbers;

• $\forall 1 \le i \le n$, $b_i + \sum_{j=1}^{m} \alpha_{(i,j)} \le D$: the deadline is not exceeded by any task;

• $\forall 1 \le i, i' \le n$ such that $T_i \to T_{i'}$, $b_i + \sum_{j=1}^{m} \alpha_{(i,j)} \le b_{i'}$: a task cannot start before its
predecessor has completed its execution;

• $\forall 1 \le i \le n$, $\sum_{j=1}^{m} \alpha_{(i,j)} \times s_j \ge w_i$: task $T_i$ is completely executed.

The objective function is then $\min \sum_{i=1}^{n} \sum_{j=1}^{m} \alpha_{(i,j)} s_j^3$.

The size of this linear program is clearly polynomial in the size of the instance, all $n(m + 1)$
variables are rational, and therefore it can be solved in polynomial time [30]. □
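For a single task, the linear program above can be solved by hand, which gives some intuition for the general case. The sketch below relies on an assumption not proved in this section: since power is a convex function of speed, a lone task of weight w with deadline D is best served by mixing the two consecutive speeds that bracket w/D. The function name `hop_single_task` is hypothetical, and a full instance would still require an LP solver.

```python
def hop_single_task(w, D, speeds):
    """Time spent at each speed for one task under Vdd-Hopping.

    Returns a dict {speed: duration}. Assumes convexity of the power
    function, so only two consecutive speeds are ever mixed.
    """
    speeds = sorted(speeds)
    target = w / D                     # average speed needed to finish at D
    if target > speeds[-1]:
        raise ValueError("deadline cannot be met")
    if target <= speeds[0]:
        return {speeds[0]: w / speeds[0]}   # slowest mode, may finish early
    for lo, hi in zip(speeds, speeds[1:]):
        if lo <= target <= hi:
            # Solve t_lo + t_hi = D and lo * t_lo + hi * t_hi = w.
            t_hi = (w - lo * D) / (hi - lo)
            return {lo: D - t_hi, hi: t_hi}


def hop_energy(plan):
    """Objective of the linear program: sum of duration * speed^3."""
    return sum(t * s ** 3 for s, t in plan.items())


plan = hop_single_task(3, 2, [1, 2])
mixed = hop_energy(plan)               # hopping between speeds 1 and 2
single_mode = 3 * 2 ** 2               # running the whole task at speed 2
```

In this small instance, hopping spends one time unit at each speed and uses less energy than the only single-speed option that meets the deadline.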
5.2 NP-completeness results

Theorem 6. With the Incremental model (and hence the Discrete model),
MinEnergy(G, D) is NP-complete.

Proof. We consider the associated decision problem: given an execution graph, a deadline, and
a bound on the energy consumption, can we find an execution speed for each task such that the
deadline and the bound on energy are respected? The problem is clearly in NP: given the execution
speed of each task, computing the execution time and the energy consumption can be done in
polynomial time.

To establish the completeness, we use a reduction from 2-Partition [11]. We consider an
instance $I_1$ of 2-Partition: given n strictly positive integers $a_1, \ldots, a_n$, does there exist a subset I of
$\{1, \ldots, n\}$ such that $\sum_{i \in I} a_i = \sum_{i \notin I} a_i$? Let $T = \frac{1}{2} \sum_{i=1}^{n} a_i$.

We build the following instance $I_2$ of our problem: the execution graph is a linear chain with
n tasks, where:

• task $T_i$ has size $w_i = a_i$;

• the processor can run at m = 2 different speeds;

• $s_1 = 1$ and $s_2 = 2$ (i.e., $s_{\min} = 1$, $s_{\max} = 2$, $\delta = 1$);

• D = 3T/2;

• E = 5T.

Clearly, the size of $I_2$ is polynomial in the size of $I_1$.

Suppose first that instance $I_1$ has a solution I. For all $i \in I$, $T_i$ is executed at speed 1, otherwise
it is executed at speed 2. The execution time is then $\sum_{i \in I} a_i + \sum_{i \notin I} a_i / 2 = \frac{3}{2} T = D$, and the
energy consumption is $\sum_{i \in I} a_i + \sum_{i \notin I} a_i \times 2^2 = 5T = E$. Both bounds are respected, and
therefore the execution speeds are a solution to $I_2$.

Suppose now that $I_2$ has a solution. Since we consider the Discrete and Incremental
models, each task runs either at speed 1, or at speed 2. Let $I = \{i \mid T_i \text{ is executed at speed } 1\}$. Note
that we have $\sum_{i \notin I} a_i = 2T - \sum_{i \in I} a_i$.

The execution time is $D' = \sum_{i \in I} a_i + \sum_{i \notin I} a_i / 2 = T + \left(\sum_{i \in I} a_i\right)/2$. Since the deadline is not
exceeded, $D' \le D = 3T/2$, and therefore $\sum_{i \in I} a_i \le T$.

For the energy consumption of the solution of $I_2$, we have $E' = \sum_{i \in I} a_i + \sum_{i \notin I} a_i \times 2^2 =
2T + 3 \sum_{i \notin I} a_i$. Since $E' \le E = 5T$, we obtain $3 \sum_{i \notin I} a_i \le 3T$, and hence $\sum_{i \notin I} a_i \le T$.

Since $\sum_{i \in I} a_i + \sum_{i \notin I} a_i = 2T$, we conclude that $\sum_{i \in I} a_i = \sum_{i \notin I} a_i = T$, and therefore $I_1$ has
a solution. This concludes the proof. □
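The reduction can be exercised on small inputs. The sketch below builds the instance $I_2$ from a list of integers and then searches all speed assignments exhaustively, which is exponential and only meant to illustrate the equivalence; the names `build_instance` and `chain_feasible` are hypothetical.

```python
from itertools import product


def build_instance(a):
    """Instance I_2 built from the 2-Partition integers a_1, ..., a_n."""
    T = sum(a) / 2
    return {"weights": list(a), "speeds": (1, 2),
            "deadline": 1.5 * T, "energy_bound": 5 * T}


def chain_feasible(inst):
    """Brute-force search for speeds meeting both bounds on a linear chain."""
    w = inst["weights"]
    for choice in product(inst["speeds"], repeat=len(w)):
        time = sum(wi / s for wi, s in zip(w, choice))
        energy = sum(wi * s ** 2 for wi, s in zip(w, choice))
        if time <= inst["deadline"] and energy <= inst["energy_bound"]:
            return choice              # tasks at speed 1 form the subset I
    return None


yes_instance = chain_feasible(build_instance([3, 1, 1, 2, 2, 1]))
no_instance = chain_feasible(build_instance([1, 1, 3]))
```

The first list can be split into two halves of sum 5, so a valid speed assignment exists; the second list has an odd total and admits none.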

5.3 Approximation results 

Here we explain, for the Incremental and Discrete models, how the solution to the NP-hard
problem can be approximated. Note that, given an execution graph and a deadline, the optimal
energy consumption with the Continuous model is always lower than that with the other models,
which are more constrained.

Theorem 7. With the Incremental model, for any integer K > 0, the MinEnergy(G, D)
problem can be approximated within a factor $\left(1 + \frac{\delta}{s_{\min}}\right)^2 \left(1 + \frac{1}{K}\right)^2$, in a time polynomial in the size
of the instance and in K.
Proof. Consider an instance $I_{inc}$ of the problem with the Incremental model. The
execution graph G has n tasks, D is the deadline, $\delta$ is the minimum permissible speed increment, and
$s_{\min}$, $s_{\max}$ are the speed bounds. Moreover, let K > 0 be an integer, and let $E_{inc}$ be the optimal
value of the energy consumption for this instance $I_{inc}$.

We construct the following instance $I_{vdd}$ with the Vdd-Hopping model: the execution graph
and the deadline are the same as in instance $I_{inc}$, and the speeds can take the values

$$\left\{ s_{\min} \times \left(1 + \frac{1}{K}\right)^i \right\}_{0 \le i \le N},$$

where N is such that $s_{\max}$ is not exceeded: $N = \left\lfloor \left(\ln(s_{\max}) - \ln(s_{\min})\right) / \ln\left(1 + \frac{1}{K}\right) \right\rfloor$. As N is
asymptotically of order $O(K \ln(s_{\max}))$, the number of possible speeds in $I_{vdd}$, and hence the size
of $I_{vdd}$, is polynomial in the size of $I_{inc}$ and K.

Next, we solve $I_{vdd}$ in polynomial time thanks to Theorem 5. For each task $T_i$, let $s_i^{(vdd)}$ be
the average speed of $T_i$ in this solution: if the execution time of the task in the solution is $d_i$,
then $s_i^{(vdd)} = w_i / d_i$; $E_{vdd}$ is the optimal energy consumption obtained with these speeds. Let
$s_i^{(algo)} = \min_u \{ s_{\min} + u \times \delta \mid s_{\min} + u \times \delta \ge s_i^{(vdd)} \}$ be the smallest speed in $I_{inc}$ which is larger
than $s_i^{(vdd)}$. There exists such a speed since, because of the values chosen for $I_{vdd}$, $s_i^{(vdd)} \le s_{\max}$.
The values $s_i^{(algo)}$ can be computed in time polynomial in the size of $I_{inc}$ and K. Let $E_{algo}$ be the
energy consumption obtained with these values.

In order to prove that this algorithm is an approximation of the optimal solution, we need to
prove that $E_{algo} \le \left(1 + \frac{\delta}{s_{\min}}\right)^2 \left(1 + \frac{1}{K}\right)^2 \times E_{inc}$. For each task $T_i$, $s_i^{(algo)} - \delta < s_i^{(vdd)} \le s_i^{(algo)}$.
Since $s_{\min} \le s_i^{(vdd)}$, we derive that $s_i^{(algo)} \le s_i^{(vdd)} \times \left(1 + \frac{\delta}{s_{\min}}\right)$. Summing over all tasks, we get

$$E_{algo} = \sum_{i} w_i \left(s_i^{(algo)}\right)^2 \le \sum_{i} w_i \left(s_i^{(vdd)} \times \left(1 + \frac{\delta}{s_{\min}}\right)\right)^2 \le E_{vdd} \times \left(1 + \frac{\delta}{s_{\min}}\right)^2.$$

Next, we bound $E_{vdd}$ thanks to the optimal solution with the Continuous model, $E_{con}$. Let $I_{con}$
be the instance where the execution graph G, the deadline D, the speeds $s_{\min}$ and $s_{\max}$ are the
same as in instance $I_{inc}$, but now admissible speeds take any value between $s_{\min}$ and $s_{\max}$. Let
$s_i^{(con)}$ be the optimal continuous speed for task $T_i$, and let $0 \le u \le N$ be the value such that:

$$s_{\min} \times \left(1 + \frac{1}{K}\right)^u \le s_i^{(con)} \le s_{\min} \times \left(1 + \frac{1}{K}\right)^{u+1} = s_i^*.$$

In order to bound the energy consumption for $I_{vdd}$, we assume that $T_i$ runs at speed $s_i^*$, instead
of $s_i^{(vdd)}$. The solution with these speeds is a solution to $I_{vdd}$, and its energy consumption is
$E^* \ge E_{vdd}$. From the previous inequalities, we deduce that $s_i^* \le s_i^{(con)} \times \left(1 + \frac{1}{K}\right)$, and by
summing over all tasks,

$$E_{vdd} \le E^* = \sum_{i} w_i \left(s_i^*\right)^2 \le \sum_{i} w_i \left(s_i^{(con)} \times \left(1 + \frac{1}{K}\right)\right)^2 \le E_{con} \times \left(1 + \frac{1}{K}\right)^2 \le E_{inc} \times \left(1 + \frac{1}{K}\right)^2.$$

□
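The speed set of the instance $I_{vdd}$ and the resulting approximation factor are easy to compute. The following sketch is illustrative only: the names `geometric_speeds` and `approximation_factor` are hypothetical, and floating-point rounding in the logarithms may change N by one when $s_{\max}$ is an exact power of the ratio.

```python
import math


def geometric_speeds(s_min, s_max, K):
    """Speeds s_min * (1 + 1/K)^i for 0 <= i <= N, none exceeding s_max."""
    ratio = 1.0 + 1.0 / K
    N = math.floor((math.log(s_max) - math.log(s_min)) / math.log(ratio))
    return [s_min * ratio ** i for i in range(N + 1)]


def approximation_factor(s_min, delta, K):
    """Guarantee of Theorem 7: (1 + delta/s_min)^2 * (1 + 1/K)^2."""
    return (1.0 + delta / s_min) ** 2 * (1.0 + 1.0 / K) ** 2


coarse = geometric_speeds(1.0, 2.0, 4)      # few speeds, looser guarantee
fine = geometric_speeds(1.0, 2.0, 100)      # many speeds, tighter guarantee
factor = approximation_factor(1.0, 0.5, 4)
```

Increasing K enlarges the Vdd-Hopping instance roughly linearly while driving the second term of the factor towards 1, which is the trade-off stated in the theorem.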

Proposition 3.

• For any integer $\delta > 0$, any instance of MinEnergy(G, D) with the Continuous model
can be approximated within a factor $\left(1 + \frac{\delta}{s_{\min}}\right)^2$ in the Incremental model with speed
increment $\delta$.

• For any integer K > 0, any instance of MinEnergy(G, D) with the Discrete model can
be approximated within a factor $\left(1 + \frac{\alpha}{s_1}\right)^2 \left(1 + \frac{1}{K}\right)^2$, with $\alpha = \max_{1 \le i < m} \{s_{i+1} - s_i\}$, in a
time polynomial in the size of the instance and in K.

Proof. For the first part, let $s_i^{(con)}$ be the optimal continuous speed for task $T_i$ in instance $I_{con}$;
$E_{con}$ is the optimal energy consumption. For any task $T_i$, let $s_i$ be the speed of $I_{inc}$ such that
$s_i - \delta < s_i^{(con)} \le s_i$. Since $s_i^{(con)} \ge s_{\min}$, we have $s_i \le s_i^{(con)} \times \left(1 + \frac{\delta}{s_{\min}}\right)$. Let E be the energy with speeds $s_i$;
then $E \le E_{con} \times \left(1 + \frac{\delta}{s_{\min}}\right)^2$. Let $E_{inc}$ be the optimal energy of $I_{inc}$. Since the speeds $s_i$ form a
valid solution of $I_{inc}$, $E_{inc} \le E \le E_{con} \times \left(1 + \frac{\delta}{s_{\min}}\right)^2$.

For the second part, we use the same algorithm as in Theorem 7. The same proof leads to the
approximation ratio with $\alpha$ instead of $\delta$. □
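The first part of the proposition can be illustrated on independent tasks, for which the optimal continuous speed of a task of weight w is simply w/D, provided it lies between $s_{\min}$ and $s_{\max}$. The sketch below rounds each continuous speed up to the incremental grid and compares energies; the function name `round_up`, the sample weights, and the small tolerance used to absorb floating-point noise are assumptions made for illustration.

```python
import math


def round_up(speed, s_min, delta):
    """Smallest speed s_min + u * delta that is at least `speed`."""
    # The tiny tolerance avoids jumping one step too far on rounding noise.
    steps = max(0, math.ceil((speed - s_min) / delta - 1e-12))
    return s_min + steps * delta


def energy(weights, speeds):
    return sum(w * s ** 2 for w, s in zip(weights, speeds))


weights, D, s_min, delta = [12.0, 17.0, 23.0], 10.0, 1.0, 0.5
continuous = [w / D for w in weights]            # optimal continuous speeds
incremental = [round_up(s, s_min, delta) for s in continuous]
ratio = energy(weights, incremental) / energy(weights, continuous)
bound = (1.0 + delta / s_min) ** 2
```

Rounding up keeps the schedule feasible because every task only gets faster, and the observed ratio stays well below the worst-case bound in this example.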



6 Conclusion 

In this paper, we have assessed the tractability of a classical scheduling problem, with task pre- 
allocation, under various energy models. We have given several results related to CONTINUOUS 
speeds. However, while these are of conceptual importance, they cannot be achieved with physical 
devices, and we have analyzed several models enforcing a bounded number of achievable speeds, 
a.k.a. modes. In the classical Discrete model that arises from DVFS techniques, admissible 
speeds can be irregularly distributed, which motivates the Vdd-Hopping approach that mixes two 
consecutive modes optimally. While computing optimal speeds is NP-hard with discrete modes, it 
has polynomial complexity when mixing speeds. Intuitively, the Vdd-Hopping approach allows 
for smoothing out the discrete nature of the modes. An alternate (and simpler in practice) solution 
to Vdd-Hopping is the Incremental model, where one sticks with unique speeds during task 
execution as in the Discrete model, but where consecutive modes are regularly spaced. Such a 
model can be made arbitrarily efficient, according to our approximation results. 

Altogether, this paper has laid the theoretical foundations for a comparative study of energy
models. In recent years, we have observed an increased concern for green computing, and
a rapidly growing number of approaches. It will be very interesting to see which energy-saving
technological solutions will be implemented in forthcoming processor chips!